Please replace the paragraph at p. 1, lns. 22-29 with the following paragraph:

Currently, most systems do not deal with the problem that the sampling frequency might differ considerably between the sending and the receiving side. One possible solution, proposed in EP-0680033 A2, works on pitch periods. Adding or removing pitch periods in the speech signal changes the duration of a speech segment without affecting speech characteristics other than speed. This proposed solution might be used as an indirect sample rate conversion method.



Please replace the paragraph at p. 2, lns. 1-11 with the following paragraph:
Another solution uses the beginning of talkspurts as an indication to reset the playout buffer to a specified level. The distance, in number of samples, between two consecutive talkspurts is increased if the receiving side is playing faster than the sending side and decreased if the receiving side is playing slower than the sending side. In IP-telephony solutions using the IP/UDP/RTP protocols (Internet Protocol/User Datagram Protocol/Real-time Transport Protocol), a marker flag in the RTP header is used to identify the beginning of a talkspurt. At the beginning of a talkspurt, the playout buffer is set to a suitable size.
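As an illustration of the marker-flag mechanism, the following Python sketch decodes the 12-byte fixed RTP header defined in RFC 3550 and reads the marker (M) bit, which RTP audio profiles use to flag the first packet of a talkspurt. The `prefill_samples` policy and its `target_level` parameter are hypothetical; they only show where a playout buffer reset could hook in and are not taken from any particular implementation.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the fixed RTP header")
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,
        marker=bool(second & 0x80),  # M bit: set on the first packet of a talkspurt
        payload_type=second & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def prefill_samples(header: RtpHeader, buffered: int, target_level: int) -> int:
    """Hypothetical reset policy: at a talkspurt start, return how many silence
    samples to play first so that the buffer holds target_level samples."""
    if not header.marker:
        return 0
    return max(0, target_level - buffered)


# Version 2, marker set, payload type 0 (PCMU), seq 1, timestamp 160.
packet = struct.pack("!BBHII", 0x80, 0x80, 1, 160, 0x1234) + bytes(160)
header = parse_rtp_header(packet)
print(header.version, header.marker, header.payload_type)  # 2 True 0
print(prefill_samples(header, buffered=160, target_level=480))  # 320
```

Packets without the M bit leave the buffer untouched under this policy, which mirrors the limitation discussed below: nothing happens in the middle of a long talkspurt.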



Please replace the paragraph at p. 2, lns. 12-20 with the following paragraph:



The solution according to EP-0680033 A2, where pitch periods are removed or inserted, assumes a fixed conversion factor between the receiving and transmitting side. Therefore, it cannot be used in dynamic systems, i.e. systems where the sampling frequencies vary. Further, it does not solve the problem of underrun or overrun situations; instead, it focuses on changing the playback rate of a speech signal stored in compressed form for later playback at a speed different from that at which it was stored.





Please replace the paragraph at p. 2, lns. 21-28 with the following paragraph:



Using the method of resetting the playout buffer to a certain size causes problems if there are very long talkspurts, e.g. a broadcast from one speaker to several listeners. Since the length of a talkspurt is not known at the beginning of the talkspurt, the size to reset to might be either too small or too large. If it is too small, underrun will occur, and if it is too large, unnecessary delay is introduced. Thus, the problem persists.



Please replace the paragraph at p. 2, lns. 29-31 with the following paragraph:



The general problem with the currently known approaches is that they are static and inflexible. 
Therefore, dynamic solutions are required. 






Please replace the paragraph at p. 3, lns. 8-13 with the following paragraph:



When sampling frequencies are not controlled, underrun or overrun might occur in the playout buffer at the receiving side, which causes audible artifacts in the speech signal. To avoid said overrun or underrun, there is a need for dynamically keeping the playout buffer at an average size, i.e. controlling the fullness of the playout buffer.
Please replace the paragraph at p. 3, lns. 14-16 with the following paragraph:
One object of the present invention is thus to provide a method for reducing audio artifacts in a 
speech signal due to overrun or underrun in the playout buffer. 

Please replace the paragraph at p. 3, lns. 17-18 with the following paragraph:
Another object of the invention is to dynamically control the fullness of the playout buffer so as not to introduce extra delay.






Please replace the paragraph at p. 3, lns. 19-29 with the following paragraph:

The above mentioned and other objects are achieved by means of dynamic sample rate conversion of speech frames, i.e. converting speech frames comprising N samples to instead comprise either N+1 or N-1 samples. More specifically, the invention works on an LPC-residual of the speech frame. By adding or removing a sample in the LPC-residual, a sample rate conversion is achieved. The LPC-residual is the output from an LPC-filter, which removes the short-term correlation from the speech signal. The LPC-filter is a linear predictive coding filter in which each sample is predicted as a linear combination of previous samples.
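To make the LPC-filter concrete, the Python sketch below estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion, then computes the LPC-residual with the inverse filter A(z) = 1 - sum_k a_k z^-k. This is one common way of obtaining LPC coefficients, assumed here for illustration; the windowing and bandwidth expansion used in real codecs are omitted, and the synthetic test frame is invented.

```python
def autocorrelation(x, order):
    """Autocorrelation r[0..order] of a frame (samples outside it count as zero)."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x))) for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a[1..order], where
    x(n) is predicted as sum_k a[k] * x(n - k)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def lpc_residual(x, coeffs):
    """Inverse filter A(z): r(n) = x(n) - sum_k a_k x(n - k), zero history."""
    return [
        x[n] - sum(a * x[n - k] for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        for n in range(len(x))
    ]


# Synthetic voiced frame: pulses every 40 samples through an all-pole filter.
excitation = [1.0 if n % 40 == 0 else 0.0 for n in range(160)]
speech = []
for n, e in enumerate(excitation):
    s1 = speech[n - 1] if n >= 1 else 0.0
    s2 = speech[n - 2] if n >= 2 else 0.0
    speech.append(e + 1.3 * s1 - 0.6 * s2)

coeffs = levinson_durbin(autocorrelation(speech, 2), 2)
residual = lpc_residual(speech, coeffs)
# The four largest residual samples mark the pitch pulses.
peaks = sorted(sorted(range(len(residual)), key=lambda n: abs(residual[n]))[-4:])
print([round(c, 3) for c in coeffs], peaks)
```

Because the short-term correlation is removed, what remains in the residual is essentially the pulse excitation, which is why pitch pulses are easy to locate there.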

Please replace the paragraph at p. 3, lns. 30-33 through p. 4, lns. 1-4 with the following paragraph:
By using the proposed sample rate conversion method, the playout buffer of, e.g., an IP-telephony terminal can be continuously controlled with only small audio artifacts. Since the method works on a sample-by-sample basis, the playout buffer can be kept to a minimum and hence no extra delay is introduced. The solution also has very low complexity, especially when the LPC-residual is already available, as is the case in, e.g., a speech decoder.



Please replace the paragraph at p. 4, lns. 10-13 with the following paragraph:



Although aspects of the invention have been summarised above, the method and apparatus according to the appended claims define the scope of the invention.



Please replace the paragraph at p. 5, lns. 5-20 with the following paragraph:




Referring to FIG. 1, a method for improving speech quality in a communication system includes a first terminal unit TRX1 transmitting speech signals having a first sample frequency F1, and a second terminal unit TRX2 receiving said speech signals, buffering them in a playout buffer 100 with said first frequency F1 and playing them out from said playout buffer with a second frequency F2. When the buffering frequency F1 is higher than the playout frequency F2, the playout buffer 100 will eventually be filled with samples and subsequent samples will have to be discarded. When the buffering frequency F1 is lower than the playout frequency F2, the playout buffer will run into starvation, i.e. it will no longer have any samples to play on the output. These two problems are called overrun and underrun, respectively, and cause audible artifacts like popping and clicking sounds in the speech signal.
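The effect of a mismatch between F1 and F2 can be illustrated with a small Python model. It is a simplified sketch, assuming a buffer of fixed capacity, sample-exact arrival (no network jitter or packetization) and integer rates; the function name and parameters are illustrative only.

```python
def simulate_playout(f1: int, f2: int, capacity: int, start_level: int, n_out: int):
    """Fill a buffer at f1 samples/s and drain it at f2 samples/s for n_out
    output samples. Returns (final_level, dropped, starved): dropped counts
    overrun losses, starved counts output instants with nothing to play."""
    level, acc, dropped, starved = start_level, 0, 0, 0
    for _ in range(n_out):
        acc += f1                  # exact rational timing: f1/f2 inputs per output
        while acc >= f2:
            acc -= f2
            if level < capacity:
                level += 1
            else:
                dropped += 1       # overrun: buffer full, sample discarded
        if level > 0:
            level -= 1             # one sample played at rate f2
        else:
            starved += 1           # underrun: no sample available
    return level, dropped, starved


# One minute of playout at 8000 Hz with a 0.1 % clock mismatch either way.
print(simulate_playout(8008, 8000, capacity=160, start_level=80, n_out=480_000))
# -> (159, 401, 0): the sender runs fast and the buffer overruns
print(simulate_playout(7992, 8000, capacity=160, start_level=80, n_out=480_000))
# -> (0, 0, 400): the sender runs slow and the buffer underruns
```

Even a 0.1 % difference produces several hundred discarded or missing samples per minute, each a potential click, which is the motivation for continuous rather than talkspurt-based correction.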



Dallas2 826368 v 1, 34645.005 16USPX 



Docket NO.34645-00516USPX 

Please replace the paragraph at p. 5, lns. 21-24 with the following paragraph:
The above and other problems with underrun and overrun are solved by using dynamic sample rate conversion based on modifying the LPC-residual of the speech signal, as will be further described with reference to FIGS. 2-8.



Please replace the paragraph at p. 6, lns. 6-14 with the following paragraph:
By feeding a speech frame through the LPC-filter, H(z), the LPC-residual is found. The 
LPC-residual, shown in FIG. 3, contains pitch pulses P generated by the vocal cords. The 
distance L between two pitch pulses P is called lag. The pitch pulses P are also predictable, and 
since they represent the long-term correlation of the speech signal they are predicted through an 
LTP-filter given by the distance L between the pitch pulses P and the gain G of a pitch pulse P.
The LTP-filter is usually denoted: 

Please replace the paragraph at p. 6, lns. 16-19 with the following paragraph:

When the LPC-residual is fed through the inverse of the LTP-filter F(z), an LTP-residual is 
created. In the LTP-residual, the long-term correlation in the LPC-residual is removed, giving the 
LTP-residual a noise-like appearance. 
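The long-term prediction step can be sketched as follows, assuming the one-tap form in which the LTP-residual is e(n) = r(n) - G*r(n - L). The lag search range of 20 to 143 samples is an assumed pitch range at 8 kHz sampling, and the open-loop search by normalized correlation is one possible estimator for L and G, not necessarily the one used in the described system.

```python
def estimate_lag_and_gain(r, min_lag=20, max_lag=143):
    """Open-loop search: pick the lag L maximizing c(L)^2 / E(L) with c(L) > 0,
    where c(L) = sum r(n) r(n-L) and E(L) = sum r(n-L)^2; G = c(L) / E(L)."""
    best_lag, best_score, best_gain = 0, 0.0, 0.0
    for lag in range(min_lag, min(max_lag, len(r) - 1) + 1):
        c = sum(r[n] * r[n - lag] for n in range(lag, len(r)))
        e = sum(r[n - lag] ** 2 for n in range(lag, len(r)))
        if c > 0.0 and e > 0.0 and c * c / e > best_score:
            best_lag, best_score, best_gain = lag, c * c / e, c / e
    return best_lag, best_gain


def ltp_residual(r, lag, gain):
    """Inverse LTP filter: e(n) = r(n) - G * r(n - L), zero history before the frame."""
    return [r[n] - gain * r[n - lag] if n >= lag else r[n] for n in range(len(r))]


# LPC-residual-like frame: unit pitch pulses every 40 samples.
r = [1.0 if n % 40 == 0 else 0.0 for n in range(160)]
lag, gain = estimate_lag_and_gain(r)
e = ltp_residual(r, lag, gain)
print(lag, gain, [n for n, v in enumerate(e) if v != 0.0])  # 40 1.0 [0]
```

After long-term prediction, only the first pulse (which has no predecessor in the frame) survives, illustrating why pitch pulse positions are no longer visible in the LTP-residual.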
Please replace the paragraph at p. 6, lns. 20-27 through p. 7, lns. 1-7 with the following paragraph:
The solution according to the invention modifies the LPC-residual, shown in FIG. 3, on a sample-by-sample basis. That is, an LPC-residual block comprising N samples is converted to an LPC-residual block comprising either N+1 or N-1 samples. The LPC-residual contains less information and less energy than the speech signal, but the pitch pulses P are still easy to locate. When modifying the LPC-residual, samples that are close to a pitch pulse P should be avoided, because these samples contain more information and thus have a large influence on the speech synthesis. The LTP-residual is less suitable than the LPC-residual for modification, since the pitch pulse positions P are no longer available in it. Thus, the LPC-residual is better suited for modification than both the speech signal and the LTP-residual, since the pitch pulses P are easily located in the LPC-residual.



Please replace the paragraph at p. 7, lns. 8-9 with the following paragraph:




A sample rate conversion consists of four modules, shown in FIG. 4: 



Please replace the paragraph at p. 7, lns. 12-13 with the following paragraph:



2) LPC-Residual Extraction (LRE) modules 410 that are used to obtain the LPC-residual;
Please replace the paragraph at p. 7, lns. 14-18 with the following paragraph:

3) Sample Rate Conversion Methods (RCM) modules 420 that find the position at which to add or remove samples and determine how to perform the insertion or deletion, i.e. converting the LPC-residual block r_LPC comprising N samples to a modified LPC-residual block r̃_LPC comprising N+1 or N-1 samples; and

Please replace the paragraph at p. 7, lns. 21-23 with the following:

An idea behind embodiments of the invention is that it is possible to change the playout rate of the playout buffer 440 by removing or adding samples in the LPC-residual r_LPC.

Please replace the paragraph at p. 7, lns. 24-27 through p. 8, lns. 1-11 with the following paragraph:

The SRC module 400 decides whether samples should be added to or removed from the LPC-residual r_LPC. This is done on the basis of at least one of the four following parameters: the sampling frequencies of the sending and receiving terminal units TRX1 and TRX2, information about the speech signal, e.g. a voice activity detector signal, the status of the playout buffer, and an indicator of the beginning of a talkspurt. The four parameters are designated SRC Inputs in FIG. 4. On the basis of a function of one or several of these parameters, the SRC 400 decides when to insert or remove a sample in the LPC-residual r_LPC and, optionally, which RCM 420 to use. Since digital processing of speech signals is usually performed on a frame-by-frame basis, deciding when to remove or add samples basically amounts to deciding within which LPC-residual frame the RCM 420 is to insert or remove a sample.
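A possible decision rule for the SRC module is sketched below in Python, combining two of the listed SRC Inputs: the sampling frequencies and the playout buffer status. The watermark rule, the hysteresis value and all names are hypothetical illustrations, not the claimed decision function; voice activity and talkspurt information could be added as further conditions.

```python
from enum import Enum


class Action(Enum):
    REMOVE_SAMPLE = -1   # frame N -> N-1 samples: buffered speech is consumed faster
    NONE = 0
    ADD_SAMPLE = 1       # frame N -> N+1 samples: buffered speech is consumed slower


def frames_per_correction(f_send: float, f_recv: float, frame_len: int) -> float:
    """Average number of frames between one-sample corrections needed to
    absorb a constant mismatch between sending and receiving sample rates."""
    diff = abs(f_send - f_recv)
    return float("inf") if diff == 0 else f_recv / diff / frame_len


def decide(buffer_level: int, target_level: int, hysteresis: int) -> Action:
    """Hypothetical watermark rule applied once per LPC-residual frame."""
    if buffer_level > target_level + hysteresis:
        return Action.REMOVE_SAMPLE   # buffer filling up: shorten the next frame
    if buffer_level < target_level - hysteresis:
        return Action.ADD_SAMPLE      # buffer draining: lengthen the next frame
    return Action.NONE


print(frames_per_correction(8008, 8000, 160))  # 6.25
print(decide(200, 160, 20), decide(130, 160, 20), decide(165, 160, 20))
```

With 160-sample frames and a 0.1 % mismatch, roughly one frame in six needs a one-sample correction, so each individual modification stays small.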

Please replace the paragraph at p. 8, lns. 12-17 with the following paragraph:



There are basically three methods of obtaining the LPC-residual r_LPC that is needed as input to the RCMs 420. The methods depend on the implementation of the speech encoder and will be described with reference to FIGS. 5A-5F. The LRE solution also directly influences the SSM solution, as will become apparent below.



Please replace the paragraph at p. 8, lns. 19-34 through p. 9, lns. 1-4 with the following paragraph:
In FIG. 5A an analysis-by-synthesis speech encoder 500 with LTP-filter 540 is shown. This is a hybrid encoder where the vocal tract is described with an LPC-filter 550 and the vocal cords are described with an LTP-filter 540, while the LTP-residual r_LTP(n) is waveform-compared with a set of more or less stochastic codebook vectors from a fixed codebook 530. The input signal s(n) is divided into frames 510 with a typical length of 10-30 ms. For each frame the LPC-filter 550 is calculated through an LPC-analysis 520, and the LPC-filter 550 is included in a closed loop to find the parameters of the LTP-filter 540. The speech decoder 580 is included in the encoder and consists of the fixed codebook 530, whose output r_LTP(n) is connected to the LTP-filter 540, whose output r_LPC(n) is connected to the LPC-filter 550, which generates an estimate ŝ(n) of the original speech signal s(n). Each estimated signal ŝ(n) is compared with the original speech signal s(n) and a difference signal e(n) is calculated. The difference signal e(n) is then weighted by an error-weighting block 560 to calculate a perceptually weighted error measure e_w(n). The set of parameters that gives the least perceptually weighted error measure e_w(n) is transmitted to a receiving side 570.
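The closed-loop principle of FIG. 5A can be reduced to a toy fixed-codebook search: each candidate vector is passed through the LPC synthesis filter 1/A(z), scaled by its optimal gain and compared with the target frame. This Python sketch assumes zero filter memory, omits the LTP-filter and the error-weighting block 560 (plain squared error stands in for the perceptually weighted measure), and uses an invented pulse codebook.

```python
def synthesize(excitation, coeffs):
    """All-pole synthesis 1/A(z): y(n) = x(n) + sum_k a_k y(n - k), zero memory."""
    y = []
    for n, x in enumerate(excitation):
        y.append(x + sum(a * y[n - k] for k, a in enumerate(coeffs, start=1) if n - k >= 0))
    return y


def search_codebook(target, codebook, coeffs):
    """Return (index, gain, error) of the codevector whose synthesized, optimally
    scaled output is closest to the target in the squared-error sense."""
    best = (-1, 0.0, float("inf"))
    for index, vector in enumerate(codebook):
        y = synthesize(vector, coeffs)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, y)) / energy  # least-squares gain
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if error < best[2]:
            best = (index, gain, error)
    return best


# Codebook of single pulses; the target is vector 2 synthesized with gain 0.5.
coeffs = [0.5]
codebook = [[1.0 if n == p else 0.0 for n in range(8)] for p in range(8)]
target = [0.5 * v for v in synthesize(codebook[2], coeffs)]
print(search_codebook(target, codebook, coeffs))  # (2, 0.5, 0.0)
```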

Please replace the paragraph at p. 9, lns. 6-12 with the following paragraph:
As can be seen in FIG. 5C, the LPC-residual r_LPC is the output from the LTP-filter 540. SRC/RCM modules 545 can be connected directly to the output of the LTP-filter 540 and integrated into the speech encoder. An LRE consists of the fixed codebook 530 and the long-term predictor 540, and the SSM consists of an LPC-filter 550; thus the LRE-module and the SSM-module are natural parts of the speech decoder.
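The arrangement of FIG. 5C can be sketched as a decoder loop in which the SRC/RCM step is a callback applied to the output of the LTP-filter. In this Python sketch the LTP memory is updated with the unmodified LPC-residual, matching placement after the feedback to the LTP-filter 540, while the altered copy drives the LPC synthesis filter. The one-tap LTP form, the state handling and the `drop_last` callback are assumptions made for illustration.

```python
from typing import Callable, List, Optional

Residual = List[float]


def ltp_synthesis(r_ltp: Residual, lag: int, gain: float, memory: Residual):
    """LTP-filter: r_lpc(n) = r_ltp(n) + G * r_lpc(n - L); memory holds past r_lpc."""
    buf = list(memory)
    out = []
    for x in r_ltp:
        past = buf[-lag] if len(buf) >= lag else 0.0
        out.append(x + gain * past)
        buf.append(out[-1])
    return out, buf[-lag:]


def lpc_synthesis(r_lpc: Residual, coeffs: List[float], memory: Residual):
    """LPC-filter 1/A(z): s(n) = r_lpc(n) + sum_k a_k s(n - k); memory holds past s."""
    buf = list(memory)
    out = []
    for x in r_lpc:
        s = x + sum(a * buf[-k] for k, a in enumerate(coeffs, start=1) if len(buf) >= k)
        out.append(s)
        buf.append(s)
    return out, buf[-len(coeffs):]


def decode_frame(r_ltp, lag, gain, coeffs, ltp_mem, lpc_mem,
                 rcm: Optional[Callable[[Residual], Residual]] = None):
    """FIG. 5C order: fixed codebook output -> LTP-filter -> SRC/RCM -> LPC-filter."""
    r_lpc, ltp_mem = ltp_synthesis(r_ltp, lag, gain, ltp_mem)  # feedback sees unmodified r_lpc
    if rcm is not None:
        r_lpc = rcm(r_lpc)  # N -> N+1 or N-1 samples, outside the LTP feedback loop
    speech, lpc_mem = lpc_synthesis(r_lpc, coeffs, lpc_mem)
    return speech, ltp_mem, lpc_mem


def drop_last(r: Residual) -> Residual:
    """Hypothetical RCM: remove the final residual sample of the frame."""
    return r[:-1]


excitation = [1.0] + [0.0] * 79  # one fixed-codebook pulse in an 80-sample frame
plain, mem_a, _ = decode_frame(excitation, 40, 0.8, [0.9], [], [])
shorter, mem_b, _ = decode_frame(excitation, 40, 0.8, [0.9], [], [], rcm=drop_last)
print(len(plain), len(shorter), mem_a == mem_b)  # 80 79 True
```

Keeping the RCM outside the LTP loop means the decoder state stays identical to the encoder's, so the modification cannot propagate into later frames through the long-term predictor.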



Please replace the paragraph at p. 9, lns. 13-27 with the following paragraph:
If the speech encoder, on the other hand, is an analysis-by-synthesis speech encoder where the LTP-filter 540 is replaced by an adaptive codebook 590 as shown in FIG. 5B, the LPC-residual r_LPC(n) is the sum of the outputs of the adaptive and the fixed codebooks 590 and 530. All other elements have the same function as in FIG. 5A, which shows an analysis-by-synthesis speech encoder 500 with LTP-filter. As can be seen in FIG. 5D, the LPC-residual r_LPC is the sum of the outputs from the adaptive and fixed codebooks 590 and 530. The SRC/RCM modules 545 can thus again be connected directly to that output and integrated into the speech encoder as shown in FIG. 5D. The LRE consists of the adaptive and the fixed codebooks 590 and 530, and the SSM consists of an LPC-filter 550; thus the LRE module and the SSM module are again natural parts of the speech decoder.




Please replace the paragraph at p. 9, lns. 28-33 through p. 10, lns. 1-4 with the following paragraph:

When the speech encoder has some sort of backward adaptation, it is not feasible to make alterations in the LPC-residual, since this would affect the adaptation process in a detrimental way. FIG. 5E shows how, in these cases, the output ŝ(n) from the LPC-filter 550 can be fed to an inverse LPC-filter 525 placed after the speech decoder. After the sample rate conversion has been made in the SRC/RCM modules 545, an LPC-filtering 550 is performed to reproduce the speech signal. The LRE module consists of the inverse LPC-filter 525 and the SSM module consists of the LPC-filter 550.
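The FIG. 5E chain (inverse LPC-filter 525, SRC/RCM 545, LPC-filter 550) can be sketched in Python as below, assuming the decoder's LPC coefficients are available to both filters and that filter memories start at zero. With an identity RCM the chain reproduces the decoded speech up to rounding, so only the inserted or removed samples change the output; the example signal, coefficients and removal position are invented.

```python
def inverse_lpc(x, coeffs):
    """LRE of FIG. 5E: r(n) = x(n) - sum_k a_k x(n - k)."""
    return [x[n] - sum(a * x[n - k] for k, a in enumerate(coeffs, start=1) if n >= k)
            for n in range(len(x))]


def synthesis_lpc(r, coeffs):
    """SSM: y(n) = r(n) + sum_k a_k y(n - k)."""
    y = []
    for n, v in enumerate(r):
        y.append(v + sum(a * y[n - k] for k, a in enumerate(coeffs, start=1) if n >= k))
    return y


def convert(decoded, coeffs, rcm):
    """Post-decoder sample rate conversion: analysis, modification, resynthesis."""
    return synthesis_lpc(rcm(inverse_lpc(decoded, coeffs)), coeffs)


def identity(r):
    return list(r)


def remove_at(pos):
    """Hypothetical RCM factory: delete the residual sample at index pos."""
    return lambda r: r[:pos] + r[pos + 1:]


coeffs = [1.2, -0.5]
decoded = [((n * 7) % 11 - 5) / 5.0 for n in range(40)]
same = convert(decoded, coeffs, identity)
print(max(abs(a - b) for a, b in zip(same, decoded)) < 1e-9)        # True
print(len(decoded), len(convert(decoded, coeffs, remove_at(25))))   # 40 39
```

For FIG. 5F the same chain applies, except that the coefficients would come from a separate LPC analysis of the decoded signal instead of from the decoder.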



Please replace the paragraph at p. 10, lns. 5-15 with the following paragraph:



FIG. 5F shows how it is possible to produce an LPC-residual r_LPC(n) through a full LPC analysis. The output ŝ(n) from the speech decoder is fed to both an LPC analysis block 520 and an LPC-inverse filter 525. After the sample rate conversion has been made in the SRC/RCM modules 545, an LPC filtering 550 is performed to reproduce the speech signal. In this case, the LRE consists of the LPC analysis 520 and the LPC inverse filter 525, and the SSM module consists of the LPC filter 550. Performing an LPC analysis is considered to be well known to a person skilled in the art and is therefore not discussed any further.


Please replace the paragraph at p. 10, lns. 16-23 with the following paragraph:




Referring again to FIG. 4, assume that the SRC-module 400 has decided that a sample should be added to or removed from the LPC-residual and that the LRE module 410 has produced an LPC-residual r_LPC. The RCM-module 420 then has to find the exact position in the LPC-residual r_LPC at which to add or remove a sample, and perform the addition or removal. There are four different methods for the RCM-module 420 to find the insertion or deletion point.



Please replace the paragraph at p. 10, lns. 24-28 with the following paragraph:



The first and most primitive method arbitrarily removes or adds a sample whenever this becomes necessary. If the sample rate difference between the terminals is small, this will only lead to minor artifacts, since the adding or removing is performed very seldom.



Please replace the paragraph at p. 11, lns. 17-26 with the following paragraph:



The fourth method, illustrated in FIG. 6, uses knowledge about the position P of a pitch pulse and the lag L between two pitch pulses. With this knowledge, it is possible to calculate a position P' having low energy, at which it is therefore appropriate to add or remove a sample. The new position P' can be expressed as P' = P + kL, wherein the constant k is selected so that P' lies somewhere in the middle between two pitch pulses, thus avoiding positions with high energy. A typical value of k is in the range of 0.5 to 0.8.
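The fourth method reduces to evaluating P' = P + kL. In the Python sketch below, P is assumed to be the position of the largest-magnitude residual sample in the frame and L is assumed to be already known (e.g. from the LTP lag); if P' falls outside the frame, the sketch steps back one lag so that the point keeps the same relative position in the previous pitch period. The helper name and this fallback are illustrative choices, not part of the described method.

```python
def modification_point(residual, lag, k=0.65):
    """Return P' = P + round(k * L), with P the strongest pitch pulse in the frame."""
    if not 0.0 < k < 1.0:
        raise ValueError("k should place P' between two pitch pulses (0 < k < 1)")
    p = max(range(len(residual)), key=lambda n: abs(residual[n]))
    p_new = p + round(k * lag)
    if p_new >= len(residual):
        p_new -= lag  # same relative position, one pitch period earlier
    return min(max(p_new, 0), len(residual) - 1)


frame = [0.0] * 160
for pos, amp in [(10, 0.8), (50, 1.0), (90, 0.8), (130, 0.7)]:
    frame[pos] = amp
print(modification_point(frame, lag=40, k=0.5))  # 70: halfway between 50 and 90
```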



Please replace the paragraph at p. 11, lns. 27-31 with the following paragraph:
When the RCM-module 420 has calculated the position at which to add or remove a sample it 
must be determined how to perform the insertion or deletion. There are three methods of 
performing such insertion or deletion depending on the type of LRE-module used. 



Please replace the paragraph at p. 12, lns. 1-7 with the following paragraph:




In the first method, either zeros are added or samples with small amplitudes are removed. This method can be used for all LRE solutions described above. (See FIGS. 5C-5F.) Notice that in FIGS. 5C and 5D the SRC/RCM-modules are placed before the synthesis filter SSM, but after the feedback of the LPC-residual to the LTP-filter 540 and the adaptive codebook 590, respectively.
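The first insertion/deletion method can be sketched in Python as follows, assuming the insertion or deletion point (for example P' from the fourth position method) has already been chosen. The `window` within which the smallest-amplitude sample is sought is an assumed parameter.

```python
def insert_zero(residual, pos):
    """N -> N+1: insert a zero-valued sample before index pos."""
    return residual[:pos] + [0.0] + residual[pos:]


def remove_smallest(residual, pos, window=5):
    """N -> N-1: delete the smallest-magnitude sample within pos +/- window."""
    lo, hi = max(0, pos - window), min(len(residual), pos + window + 1)
    idx = min(range(lo, hi), key=lambda n: abs(residual[n]))
    return residual[:idx] + residual[idx + 1:]


r = [1.0, 0.5, -0.1, 0.3, 0.02, 0.4, 1.0]
print(insert_zero(r, 3))                 # [1.0, 0.5, -0.1, 0.0, 0.3, 0.02, 0.4, 1.0]
print(remove_smallest(r, 3, window=2))   # [1.0, 0.5, -0.1, 0.3, 0.4, 1.0]
```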

Please replace the paragraph at p. 12, lns. 8-15 with the following paragraph:
In the second method, insertion is carried out by adding zeros and interpolating surrounding samples. Deletion is performed by removing samples and preferably smoothing surrounding samples. This method can also be used for all of the LRE solutions described above. (See FIGS. 5C-5F.) Notice that in FIGS. 5C and 5D the SRC/RCM-modules are placed before the synthesis filter SSM, but after the feedback of the LPC-residual to the LTP-filter 540 and the adaptive codebook 590, respectively.
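One way to realize the second method is shown in the Python sketch below: a zero is inserted and then replaced by the linear interpolation of its two neighbours, and after a deletion the two samples that become adjacent are blended with 3:1 weights. Both the interpolation rule and the smoothing weights are assumptions; the text only states that surrounding samples are interpolated or, preferably, smoothed.

```python
def insert_interpolated(residual, pos):
    """N -> N+1: insert a sample before index pos, set to the mean of its
    neighbours (left as zero at the frame edges)."""
    out = residual[:pos] + [0.0] + residual[pos:]
    if 0 < pos < len(residual):
        out[pos] = 0.5 * (residual[pos - 1] + residual[pos])
    return out


def delete_smoothed(residual, pos):
    """N -> N-1: remove the sample at pos and blend the two samples that become
    adjacent, pulling each a quarter of the way towards the other."""
    out = residual[:pos] + residual[pos + 1:]
    if 0 < pos < len(out):
        left, right = out[pos - 1], out[pos]
        out[pos - 1] = 0.75 * left + 0.25 * right
        out[pos] = 0.25 * left + 0.75 * right
    return out


print(insert_interpolated([0.0, 2.0, 4.0], 2))        # [0.0, 2.0, 3.0, 4.0]
print(delete_smoothed([0.0, 1.0, 9.0, 2.0, 3.0], 2))  # [0.0, 1.25, 1.75, 3.0]
```

Compared with the first method, the interpolation avoids a sudden zero in an otherwise non-zero stretch of residual, at the cost of touching one or two extra samples.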
Please replace the paragraph at p. 12, lns. 16-25 with the following paragraph:







In the third method, the SRC/RCM-modules 545 are placed within the feedback loop of the 
speech decoder instead of after the feedback loop as in the previous methods. (See FIGS. 5G-5J.) 
Placing the SRC/RCM-modules within the feedback loop uses real LPC residual samples for the 
sample rate conversion, by changing the number of components in the LPC-residual. The 
implementation differs depending on whether it is an analysis-by-synthesis speech encoder with 
LTP filter shown in FIG. 5A or an analysis-by-synthesis speech encoder with adaptive codebook
shown in FIG. 5B that is used. 



Please replace the paragraph at p. 12, lns. 26-33 through p. 13, lns. 1-2 with the following paragraph:

For the speech decoder with LTP filter (see FIG. 5A), the SRC/RCM-modules 545 can be placed within the feedback loop in two different ways, either within the LTP feedback loop as shown in FIG. 5G or at the output from the fixed codebook 530 as shown in FIG. 5H. For the speech decoder with adaptive codebook (see FIG. 5B), the SRC/RCM can also be placed in two different ways, i.e. either before (FIG. 5J) or after (FIG. 5I) the summation of the outputs from the adaptive and the fixed codebooks.

Please replace the paragraph at p. 13, lns. 3-21 with the following paragraph:

The alterations on the LPC-residual consist of removing or adding samples just as before, but since the SRC/RCM-modules 545 are placed within the LTP feedback loop, some modifications